Model ensembles of ecosystem services fill global certainty and capacity gaps

Sustaining ecosystem services (ES) critical to human well-being is hindered by many practitioners lacking access to ES models (“the capacity gap”) or knowledge of the accuracy of available models (“the certainty gap”), especially in the world’s poorer regions. We developed ensembles of multiple models at an unprecedented global scale for five ES of high policy relevance. Ensembles were 2 to 14% more accurate than individual models. Ensemble accuracy was not correlated with proxies for research capacity, indicating that accuracy is distributed equitably across the globe and that countries less able to research ES suffer no accuracy penalty. By making these ES ensembles and associated accuracy estimates freely available, we provide globally consistent ES information that can support policy and decision-making in regions with low data availability or low capacity for implementing complex ES models. Thus, we hope to reduce the capacity and certainty gaps impeding local- to global-scale movement toward ES sustainability.


S-1-1. InVEST
InVEST is a suite of stand-alone and free models from the Natural Capital Project (24) that are downloaded as one package from the website (https://naturalcapitalproject.stanford.edu/software/invest). Extended descriptions of each model are provided in the online user guide (https://storage.googleapis.com/releases.naturalcapitalproject.org/investuserguide/latest/index.html#).
InVEST comprises of independent modules, each module covering one ecosystem service (ES). In this study we used three of the more widely used modules; the water yield module (28), the carbon module (59) and the recreation module (60) of release v3. 8.9, which was the current version at the time of conducting this part of our research in 2021. Two of the InVEST modules (water and carbon) do not contain autonomously drawn-in data sources. Instead, all data sets need to be provided manually. The output generated is at the grain equal of the provided land cover map. In contrast the recreation module is based on drawn-in data, defined by entered years, gridcell size (spatial resolution) and area.

InVEST water yield module
The InVEST water yield model is a process model, built as a hydropower module, identifying quantitatively how much water or economic value each part of the landscape contributes to hydropower production. This is done by estimating water run-off through a single point. The model has three components: water yield, water consumption, and hydropower valuation. We used the first component here, using the gridded outputs of water run-off per gridcell, allowing standardising extraction per validation polygon among model outputs and ensembles. The required entered watersheds were dummies, over which InVEST would sum separately, and are not used for any of our outputs.
As input data we used: ➢ Summed monthly precipitation using WorldClim version 2.1 at a 0.008333° resolution: (worldclim.org/worldclim21.html). ➢ Summed monthly potential global potential Evapotranspiration in mm from CGIAR-CSI Global Aridity index v2 on a 0.008333° resolution: (cgiarcsi.community/2019arity). ➢ Root restricting raster was obtained from the Harmonized World Soil Database v1. 2: webarchive.iiasa.ac.at/HWSD using the Reference soil depth. Subsequently, this polygon layer was converted to an exact 0.008333° grid. Eqn. S1 With plant available water (PWD) and field capacity (FC) following (61), listed in Table S1. The two layers were then averaged, weighted on their depths with 30 cm and [root restricting depth -30 cm]. Subsequently, this polygon layer was converted to an exact 0.008333° grid. ➢ Land cover followed MODIS MCD12Q1 v 5.1. land cover in 17 classes (44), resampled to a 0.008333° resolution based on dominant land cover. ➢ The maximum rooting depth and evapotranspiration coefficient (Kc), as look-up table per land cover class, were from (8).
The water yield module was run for 10 different seasonality factors (Z ; Table S2). In postprocessing, each cell was assigned the most appropriate single output for the cell's number of rain days. Here, the Z factor was based on [Z= 0.2 * N], where N is the average number of rain days per year (28). The number of rain days was calculated from summing the monthly days with rain per year from the FAO Wet Day Frequency per month (data.apps.fao.org/-map/raindays). Using rounded natural breaks on the number of rain days, the Z factors per cell were assigned following Table S2.  Table S2. Assignment of rain days per year ranges to InVEST seasonality (Z) factors.

InVEST carbon module
This InVEST module is a look-up table based model, which uses maps of land cover and data on wood harvest rates, harvested product degradation rates, and stocks in four carbon pools (aboveground biomass, belowground biomass, soil, dead organic matter) to estimate the amount of carbon currently stored in a landscape or the amount of carbon sequestered over time. We did not employ the sequestration functions and restricted ourselves to the above ground, standing carbon pool only to match our validation sets. The model generates gridded maps of standing carbon per land cover based on the carbon pools at the grain equal of the provided land cover map (EU-Copernicus GLC 2000 land cover). This land cover map was used for the IPCC standing carbon assessment tables in combination with ecofloristic zones (52).
As input data we used:

SI-1-3. Photosynthetic capacity extended Scholes model
The Scholes model (8,54) to estimate grazing density is a carrying capacity model, which calculates the service of forage production potential. As an intermediate product, the model calculates a water supply layer as number of growth days (GD below), which we use as one of our water supply models. Below, we summarise the model and our extension to make it a global model.
This model output is intended to show land areas in terms of their forage production potential and not necessarily as a guide to economically viable grazing capacities (8,54). It is based on the potential of the vegetation to feed animals locally and does not include any transport of resources to other areas. The model was developed for South Africa by (54) and has been re-interpreted in (8). We refer to the full model description of (8). Here, we extend the model by adding photosynthetic possibility by including relative monthly solar radiation.
The Scholes model used in this study follows the re-interpretation of (8), necessary due to some lack of clarity in the original model description (54). In this interpretation the model is a hierarchical correction model per gridcell of the potential Livestock Units (LSU) in a gridcell: With α a scaling parameter (set to 1), β = 0.24, x = a gridcell, Ax, Bx soil fertility parameters per gridcell, Cx slope parameter per cell, Dx land cover parameter per cell, GDx growth days (Eqn. S3). Extended denotes the proxy of photosynthetically active radiation.
1. The first innermost level ( 10 ( × 1 )) describes the intrinsic maximum forage potential capacity for a given climate, estimated by the annual number of days that rainfall exceeds evapotranspiration. This term is described as 'Growth Days' and is the Scholes water supply model in this study.
The principal control on rangeland forage production is the soil water balance. The productivity of grasses, shrubs and trees is strongly correlated with the quantity of water, which they transpire relative to the quantity which they could potentially transpire if the soil moisture supply was unlimited. This is calculated as the number of annual days rainfall exceeds evapotranspiration and is used as water supply model in this work.
The annual number of days rainfall exceeds evapotranspiration was calculated in monthly bins, this calculation is used as Scholes water supply model: Eqn. S3 With P = precipitation in gridcell x in month m; E = potential evapotranspiration in gridcell x in month m; m = month (1 to 12), x = 1 km 2 gridcell and d = number of days per month.
The original model is focused on South Africa, developed at Stellenbosch (South Africa, at -33.93 South, 18.86 East), assuming a default photosynthetic period and solar quantity. To extend this model to a global scale, this assumption needs to be altered. To do so, we added a further correction element to the main equation based on relative solar radiation quantity in monthly summed watts hour -1 m -2 as proxy of photosynthetically active radiation.
We used the SRTM 90m Digital Elevation Database v4.1. as source data, resampled to a 0.008333° resolution based on the mean elevation. From this DEM, we calculated the monthly solar sum per gridcell (Solx) using the ArcGIS area solar radiation tool, which was made relative to the annual solar estimate in Stellenbosch.
Following GD was corrected as: With Solx, the monthly solar total for gridcell x, GD(m,x) = ( * ( , ) ), see above, SolS equals the monthly total at Stellenbosch (i.e., this makes it relative to the original location). τ is the Stellenbosch correction factor which is the ratio of annual summed solar radiation at Stellenbosch compared to the annual maximum globally: τ equals 1.278. 3. The third level provides a correction for slope (C). Note that, being a correction model, C equals 1 in conditions not limited by slope. It is estimated no domesticated grazing animals are present above 35° degrees slopes (≈ 70%). The slope correction was calculated as: With Sx is slope in degrees, γ = 0.0323 and θ = 1.129. The slope was calculated using ArcGIS Surface-slope tool based on the SRTM 90m Digital Elevation Database v4.1, resampled to 250 meters (cgiarcsi.community/data/srtm-90m). Subsequently the resulting C-factor was averaged to an exact 0.0083333° resolution.
4. The fourth level provides a correction for land cover (D, again with {} brackets). Land cover has a large effect on the potential forage production; if covered with other vegetation types proportionally less forage production potential is present (54). For this correction layer we used MODIS MCD12Q1 v5.1. land cover in 17 classes (44), resampled based on dominance to a 0.008333° resolution. Since the original calculation in (54) was done on South African land cover maps, (8) made an interpretation for which Scholes class was most applicable per MODIS cover class, which we extended to non-binary here (see Table S6). Note that being a correction model, D equals 1 in default non-improved grass-and herblands.
5. For the parameters, α equals 1 . β is the transformation coefficient [page 6 in (54)] from number of growth days to mean annual precipitation, and is set to 0.24. Note that β was not included in our calculation of growth days as water supply model (Eqn. S3).

SI-1-4. Recreation and additional post-processing for ARIES and Co$ting Nature
Recreation is an ensemble with multiple different modelling methods. Two approaches are based on observations (Photo uploads: InVEST and Co$ting Nature), two approaches are based on human population movement through gravity functions (ARIES and Chaplin-Kramer), a third approach is the benefit transfer approach following TEEB.
All approaches predict recreational pressure, so are comparable, but contain a whole different set of assumptions, units and resulting output. Photo approaches are heavily biased to Europe, North America, and very localized inbound tourism hotspots, showing observed (proven) recreation potential. However, photo approaches largely fail to capture domestic recreation in most of the developing world, where photo uploads are less common. In addition, radiating gravity functions provide potential values for domestic tourism, assuming equal tourism opportunities and quality of infrastructure throughout (although this is partly corrected for in ARIES; see below). Benefit transfer approaches (i.e. TEEB) share a similar assumption (i.e. no variation with each land cover). Combining models with different assumptions, and therefore biases, likely maximises the portfolio effect (see Discussion). For example, the photo approaches strongly complement the population based gravity functions of potential recreational value; e.g. tourism in national parks in countries such as Kenya, Tanzania and South Africa are visible within the ensemble maps ( Figure S7) to a much higher degree than would be expected if calculated from population density alone. This is also apparent in Europe, where next to cities, mountain ranges highlight as top recreational spots ( Figure S8).
Note: The photo approach may be biased in urban areas (i.e. where the number of photos which have been taken in those regions may not indicate the number of people that are recreating in the open landscape enjoying the nature). We opted not to mask out urban settlements as by making the full coverage maps available, we provide the opportunity to the user to mask any area based on the research questions asked.
However, we call for the development additional recreation models. Because only five recreation models were available, the median ensemble shows sudden spatial transitions -e.g. estimates driven by continuous gravity models shifting to being more influenced by photo approaches in areas with low population densities (most apparent in South-America at the edge of the Amazon basin, Figure S8).
Co$ting Nature postprocessing Co$ting Nature was run in 10°-tiles that are each individually normalised by the framework. This implies that each tile contains the maximum value of 1 and all other gridcells are relative to that, independent of values in other tiles. Therefore tiles cannot be put together in a simple mosaic, but need recalibration to a common source that scales among tiles. For example, a Co$ting Nature carbon value of 1 in a certain tile might correspond to 50 tonnes, whereas in the neighbouring tile the value of 1 could correspond to 100 tonnes. In this two tile example, the first needs to be rescaled to 0.5 before the tiles can be combined.
We used normalised global-scale outputs from other models as comparators for rescaling Co$ting Nature. Creating a ArcGIS fishnet of 10°-tiles, the single maximum value within the tile location was taken from the comparator model output. Subsequently, this single value was multiplied with all Co$ting Nature values within the tile. For example, if the maximum of the comparison model for a certain 10°-tile was 0.76, all values of Co$ting Nature for that certain tile would be multiplied by 0.76. This would not alter the relative rankings of Co$ting Nature within tiles, but rescales all tiles to a common global source. By using a single maximum value per tile of the comparator model output, some roughness at edges is unavoidable but the additional spatial autocorrelation between these model outputs remains as limited as possible. To remove outliers, the comparators were normalised with a one-sided 95% winsorising protocol following (20) prior the fishnet procedure.
In detail per service: • Co$ting Nature water supply is not rescaled, as the model output is in physical units (m 3 ) of run-off per gridcell (i.e. not accumulated flow). • Co$ting Nature carbon was rescaled per 10°-tile against the globally normalised maximum value of a mean 4-way ensemble of the outputs of InVEST, ARIES, Global Forest Watch and Conservation International (Table 2). These models each provide full coverage of either biomass or carbon -which are identical in normalised form -at similar scales and without land cover masks applied for non-forest areas. • Co$ting Nature recreation was set as the sum of culture-based tourism and nature-based tourism photo index values (both ranged 0-1). These two Co$ting Nature outputs are generally mutually exclusive per grid cell. The few values above 1 were trimmed. Thereafter, the resulting combined output was rescaled per 10°-tile against the globally normalised maximum value of the InVEST recreation output, being a similar photo approach. • Co$ting nature fuelwood was rescaled per 10°-tile against the globally normalised maximum value of a mean 3-way ensemble of the outputs of InVEST, ARIES, and Conservation International (Table 2), all after woody mask appliance. These models each provide full coverage at similar scales. • Co$ting nature fodder production was rescaled per 10°-tile against the globally normalised maximum value of Gilbert et al. (43) (Table 2), which is providing Livestock Units. Since we use the in-built input data for Co$ting Nature, we minimised the time span of the independent model runs to reduce the likelihood of input updates. The majority (190) of Co$ting Nature tiles were downloaded between 7 th December 2020 and 29 th January 2021, with 6 tile replacements of compromised zipped files downloaded on 2 nd February 2021. As first assessment of feasibility, 10 European tiles were downloaded on 9 th and 10 th November 2020. The (paid) license provides access to all policysupport.org systems (Co$tingNature, WaterWorld) for application anywhere globally at 1km or 1ha spatial resolution. Our approach ensures that practitioners with resources to obtain the same license can replicate our methods.

ARIES recreation
For ARIES recreation, the access to nature predictions were run on a per country basis, with predictions indicating numbers of people normalised within countries -i.e., all ranged from 0-1 within countries. Assuming that the value of recreation is proportional to the wealth of countries, we multiplied with GDP per head per country [GDP per head x modelled value]. Which, after subsequent global normalisation (see below), provides a worldwide scaling of these per country values.

SI-2. Converting grazing animals numbers to LSUs
For gridded livestock predictions (43) we combined available livestock classes -buffalos, cattle, chickens, ducks, horses, goats, pigs & sheep -by converting their numbers per gridcell to FAO livestock units (LSU) following (64), split into tropical and non-tropical categories. This method is based on potential grazing biomass per animal compared to a fully-grown cow. Subsequently the LSU's were summed per gridcell. A list of conversion factors per livestock type is provided in Table S5.

SI-3. Forage production and fuelwood masks for standing carbon
Models that predict potential fuelwood or forage production are rare; Co$ting Nature being a notable exception by independently calculating both. Thus, following and extending (8), we built predictors using the outputs from applicable carbon models, with the requirement that they cover all ecosystems and not only forest, including mixed vegetation categories (Table S6). The rationale is that the supply of these two ES are directly dependent on the amount of biomass present, which is also what underpins estimates of stored vegetation carbon. We did so by using MODIS-based land cover [MCD12Q1 v 5.1.; (44)] masks for grassland vegetation for seven out of the 12 employed forage production models (see Table 2 main text) and woodland vegetation for seven out of nine fuelwood models. Due to the subsequent normalisation (see main text), there is no requirement to use any transition factors, since any constant will drop out with normalisation [such the 12.7% factor following (65)].
• For forage production, we applied a spatial mask to derive predominantly grassland carbon from standing carbon outputs. We excluded areas in which little to no forage production was expected (0 values in Table S6) and included areas in which most of the above-ground carbon is assumed to be available for forage production (positive values in Table S6). The resulting carbon layer was considered as available for forage production, equivalent to the layer of LPJ-GUESS's C3 and C4 grasses combined. In contrast to (8) we did not use a binary system but employed the Scholes grazing model land cover correction factors, correcting for areas in which less vegetation is present [see (8)]. • For fuelwood we used a spatial mask to derive woody carbon from standing carbon outputs in land cover categories that have a substantial amount of woody vegetation (Table S6). Therefore, this mask includes forest areas and closed shrublands. In tropical areas, we assumed that open shrublands and woody savannas could be used for fuelwood collection but with reduced per area factors compared to forest land cover, correcting for less woody vegetation present. For this we split the MODIS raster with the IPCC carbon zones (36), distinguishing between tropical carbon versus not tropical category zones. Similarly, we have included the Cropland/Natural Vegetation Mosaic with a partial per area ratio.   In cases where an overseas territories is represented separately in one of the validation data sets it was extracted as separate data point. We refer to all as 'countries', although we are aware not all have that status. Figure S3. Location of the validation plots for pan-tropical biomass in forest plots belonging to the reference data of (38) (N = 14,478) and the centroids from carbon estimates in UK forest estates taken from (20) (N = 1,606), which are combined in this paper. Inset enlargements are added to indicate densities. Statistical calculations include spatial autocorrelation to remove the effects of the clustered data. Figure S4. Standard error of the mean (SEM) of individual models correlates with deviance of the median ensemble from validation data (P<0.001). This is consistent across AG carbon (R 2 for single service: 0.28; P <0.001; grey); fuelwood (0.18; P <0.001; yellow), forage production (0.13; P <0.001; green), recreation (0.20; ; P <0.001; red), and water (0.06; P <0.05; blue) ecosystem services. Table S7. Two-tailed correlations as F-values with (effect size) and significance of inverse of deviance per validation datapoint -where increasing accuracy is represented by increasing inverse of the deviance. This analyses extends Table 2 (main text), testing for potential changed accuracy of the ensembles against characteristics indicating the research capability of countries for five services. Accuracy improvement is the mean of pairwise comparisons for 1000 bootstraps. We use a standardised maximum DF of 178 among services following a bootstrap convergence model where as many bootstrap are taken until mean SS for all factors among runs becomes fixed. Significance of the presented F-values were assessed taking account of multiple tests, using Hochberg's step-up correction with 8 tests per ES, which are interdependent because of sharing the same spatial autocorrelation term, all DF = 1. Characteristics are calculated independently as [Y ~ Spatial Autocorrelation + Characteristic + error], with Spatial Autocorrelation to a maximum of 5° between country centroids. An interaction model is added testing for interactions between GDP and income equality as [Y ~ Spatial Autocorrelation + GDP per capita + income equality + Interaction + error] with type III Sum of Squares. Water supply is per bespoke watershed, recreation, fuelwood and forage production per country.

Water Supply Recreation AG Carbon Fuelwood Production
Forage Production Accuracy Improvement (deviance) Ensemble vs. a random selected model (median among models 14% 6.1% 6.1% 3.4% 2.7%  Table S8. One-tailed correlations as F-values with (effect size) and significance of inverse of the relative ranking difference per validation datapoint ( ( ) ↓ ) ; the input to Spearman ρ). One-tailed tests were applied to test the hypothesis that the accuracy increases with higher values of each development/equality metric (two-tailed is presented in Table S9, including effect sizes). Whereas overall Spearman ρ is calculated using the Matlab standard function corr, the per validation point inverse of the ranking difference was calculated manually. This was done as:  Table S7 for further statistical details.

SI-7. Results with other ensemble approaches
In the main text we employ the median ensemble, for which the median value per cell or catchment polygon was used. Hooftman et al. (20) explored a wider variety of ensemble approaches, simulating alternative situations where reliable validation data are either present or lacking. Here we show the mean ensemble -i.e., taking the mean value per cell or catchment polygon. Generally the median ensemble shows a higher accuracy compared to the mean ensemble, which is more sensitive to outlier values (13,20). Furthermore, we show a selection of weighted approaches in which per cell the different models have a different product factor (see below). The weighted ensembles differ in how the weights are calculated. Two of the approaches shown are deterministic, which means that the result is an inherent property of the data set -i.e. the statistical outcome is identical given the same data set. Two approaches are iterative, which means that parameter space is step-wise systematically explored improving the maximum log-likelihood until `convergence is reached -i.e. no better solution is found. With iterative approaches, theoretically, different outcomes would be possible by redoing the calculation, however, the tools used are such that differences would be minimal. Both these deterministic and iterative approaches are based on searching for maximum consensus across input models. We also included an approach in which in which we penalised model outputs that are generated at coarser spatial resolutions (grain). See (20) for discussion of all approaches.
Weighted approaches often generate a good and reliable accuracy, with the disadvantage that they are more computationally intensive and are therefore not used in the main text. Weights in all cases were normalised to sum to 1 as ( ∑ ), with weights ωi for model i and n the total number of models per service. Due to bootstrapping among data-points, the resulting accuracy was different among runs, generating means and standard deviations.
Weighted approaches have the general form: with positive weights ωi for model i for cell/polygon x, weights ωi are normalised to sum to 1, Y the modelled values for i per cell/polygon, and n the total number of models per service.
Approaches to determine weights that are used here: 1) PCA as the consensus axis is a deterministic consensus approach. Principal components were calculated using the Matlab princomp-tool, the weights per model i outputted to the equation above were the loadings to the first -main -PCA axis. So models with the better correlation to the consensus axis are assigned higher weights. 2) The correlation coefficient method is our second deterministic consensus approach. Here we calculated the full [model × model] correlation matrix using the corrcoef-tool. The weight per model was the mean correlation of that individual model with all other models, not including itself. Hence, the higher general correlation to the other models, the more weight a model has. This technique was developed to have a second deterministic approach using a consensus axis different than the PCA and can be seen as further way to minimise variance among models. 3) Regression to the median is our first iterative approach using log-likelihood regression (31).
Using multivariate regression we assess weights such that the summed results maximises the explanation of an comparator. The resulting regression coefficients are used as weights and entered in the equation above. In this case the comparator is the median ensemble, asking which contribution of models would be most closely result to the median. 4) Leave-one-out cross-validation is our second iterative approach in which entire models are cross validated one-by-one. As for regression to the median this is done using a no constant multi-variation regression with the same nlmefit-tool as above, with the same settings and naïve priors. However, in this approach we loop through the model outputs. One-by-one, a regression is performed using a single model output as comparator and the remaining model outputs as explanatory variables. For model 1 such would be the regression representation [Y1 ~ ω2Y2 + ω3Y3 …. + ωnYn]. The regression coefficients (ωi) are stored as consensus weights. After looping through all models, the mean is taken of all regression coefficients per model as weights (excluding itself), i.e. this represents the weights that would generate the highest mean consensus with all models. These values are entered as weights in the equation above. 5) Models that are generated on smaller scales (i.e. with smaller gridcells) could be more accurate since the information per cell could better represent the local situation whereas larger gridcells could be more averaged across larger areas (8,13). Here, we penalised model outputs that are generated at coarser spatial resolutions (14). The grain size was taken from the original model outputs as downloaded or generated (see Table 2 main text), so before all post-processing. The weights taken were: ω i = 1 log 10 (grain i ) , for which the resulting weights were normalised afterwards summing to 1.
Below we present two accuracy metrics: 1) The inverse of deviance (D ↓ ) -as used in the main text -, ascertaining the absolute difference of each modelled value from its validation value using the inverse of the deviance. This is relevant where modelled values are important, e.g., when testing where ES levels exceed a minimum threshold. We used the inverse of the deviance so that, like ρ, a higher value indicated greater accuracy. The calculation follows ↓ = 1 − ( 1 × ∑ | ( ) − ( ) |) in which, n = the number of spatial data points, x a spatial data point, X(x) the normalised validation value for x, and Y(x) the normalised value for the model or ensemble tested. 2) Comparing the rank order of predicted and validation data using Spearman ρ using the Matlab corr tool. This is relevant where modelling is used to discover, for example, the most important locations for delivering an ES, or conversely, those areas whose development may have least impact on ES delivery. To avoid directional confusion, when these metrics are used per point (Table 1 main text, Tables S7-9), we include an inverse of the per data point value for both deviance and ρ, i.e., the [1-value] applies to the data points; whereas in the full data-set value above it is a post-summation inverse. We note that the Spearman ρ as calculated under (2) incorporates an similar internal reverse as the inverse of the deviance, after summation of individual data point (high ranking differences providing a low ρ value). Figure S15. Ecosystem service ensembles show an increased inverse of the deviance as measure of accuracy (D ↓ ) when compared to individual models with multiple ensemble methods from (20) for: a) water, b) recreation, c) AG carbon, d) fuelwood, e) forage production. Colours: significantly higher than mean accuracy of the models (blue), significantly lower than models (red), not significantly different from mean of the models (black) and D ↓ < 0.7 (dashed), a threshold for a 'good' explanation (8). Note meaning of black and dashed colouring slightly differ from the main text. Figure S16. Ecosystem service ensembles show an increased Spearman ρ (ranking correlation) as measure of accuracy when compared to individual models with multiple ensemble methods from (20) for: a) water, b) recreation, c) AG carbon, d) fuelwood, e) forage production. Colours: significantly higher than mean accuracy of the models (blue), significantly lower than models (red), not significantly different from mean of the models (black) and no significant ranking correlation (dashed black). †opposite relationship (negative ρ).